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In many flowfield computations, accuracy of the turbulence 
model employed is frequently a limiting factor in the overall 
accuracy of the computation. This is particularly true for complex 
flowfields such as those around full aircraft configurations. Free 
shear layers such as wakes, impinging jets, (in V/STOL 

applications), and mixing layers over cavities are often part of these 
flowfields. 

Although flowfields have been computed for full aircraft, the 
memory and CPU requirements for these computations are often 
excessive. Additional computer power is required for multi- 
disciplinary computations such as coupled fluid dynamics and 
conduction heat transfer analysis. Massively parallel computers 
show promise in alleviating this situation, and the purpose of this 
effort was to adapt and optimize CFD codes to these new machines. 

The objective of this research effort was to compute the 
flowfield and heat transfer for a two-dimensional jet impinging 
normally on a cool plate. The results of this research effort were 
summarized in an AIAA Paper titled "Parallel Implementation of the 
k-e Turbulence Model". Appendix A contains the full paper. 
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Abstract 

The k-t turbulence model has been added to a paral- 
lel Navier-Stokes solver on an Intel iPSC/860 massively 
parallel computer. Both a bigh-Reynolds-number model 
with wall functions and the Chien low-Reynolds-number 
model have been implemented. Two flowfields have been 
computed: flow over a flat plate, and the flow and heat 
transfer for a two-dimensional jet impinging normally on 
a cool plate. Considerations specific to implementing 
these models on parallel machines are discussed, and tim- 
ings for the two models are presented. 

Notation 

Roman Symbols 

constant for k - e model (see Table i) 

Cj friction coefficient 

Ci constant for k - c model (see Table 1) 

C 2 constant for k - e model (see Table 1) 

C p specific heat at constant pressure 

Hi source term vector for k — c equations 

J Jacobian of coordinate transformation 

k turbulent kinetic energy 

L reference length 

n normal distance from wall 

V rate of production of turbulent kinetic energy 

Pr Prandtl number 

Q . dependent variable vector 
Re Reynolds number, Re = a^L/u^ 

Reg Reynolds number based on freestream velocity 

and momentum thickness 
St Stanton number, St = q w /pC p T w Uj 

T temperature 

T + temperature in wall coordinates, 

T + = (T - T w ) pC p u. /q w 

u, v , w velocity components in Cartesian coordinate 
directions 

velocity normalized by friction velocity 
u m friction velocity 

<7, V, W contravariant velocity components 

•Research Scientist, Member AIAA. Presently at United Tech- 
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Uj 

jet velocity 


Cartesian coordinates 

y + 

distance from wall in wall coordinates, = 

Greek Symbols 

7 

ratio of specific heats 


Kronecker delta 

€ 

dissipation rate of turbulent kinetic energy 

K 

von Karman constant, 0.41 

X 

VvcJ 

p 

molecular viscosity 

Pt 

turbulent viscosity 

u 

kinematic viscosity 


transformed coordinates 

P 

density 

<?C 

constant for k - c model, 1.3 

O’k 

constant for k - e model, 1.0 

T 

shear stress; transformed time in eqn. (1) 


Subscripts and Superscripts 



tensor indices 

t 

turbulent 

V 

viscous 

W 

wall 


partial differentiation in Cartesian coordinate 
directions 

CO 

freestream condition 

Other Symbols 

o 

vector in transformed coordinates 

ps 

microseconds 


Introduction 

Although the performance of supercomputers has been 
steadily improving, significant increases in computational 
speed are required before the analysis of complex viscous 
flows can be used as a routine design tool. An initia- 
tive has been established, the High Performance Comput- 
ing and Communications (HPCC) program, to accelerate 
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these improvements, primarily through the development 
of hardware and software for massively parallel comput- 
ers. 1 

The HPCC program is divided into two areas, Com- 
putational AeroSciences (CAS) and Earth and Space Sci- 
ence (ESS). The CAS project contains four "grand chal- 
lenge” problems, one of which is the High Performance 
Aircraft (HPA). The HPA grand challenge includes com- 
putations of the flowfieid about a full aircraft undergoing 
a maneuver and of a powered lift aircraft in transition 
from hover to forward flight. 1 

An appropriate turbulence model must be selected for 
these applications, taking into account issues of accuracy 
and computational efficiency. Towards this end, the k-€ 
turbulence model has been added to an existing Navier- 
Stokes solver 2 on an Intel iPSC/860. Both the Chien 
low-Reynolds-number k-€ model 3 and a high-Reynolds- 
number k-e model 4 with wall functions 5 have been im- 
plemented. Flow over a flat plate has been computed in 
order to test the models and to get an initial set of CPU 
times. To test the models on a more realistic geometry 
which is applicable to the HPA grand challenge problem, 
the flow and heat transfer for a 2D planar jet impinging 
on a cool surface was then computed. Relative compu- 
tational speeds of the two turbulence models differ from 
those on serial computers, and the causes of the differ- 
ences are examined and discussed. 

Intel iPSC/860 


All computations in the present study have been car- 
ried out on the Intel iPSC/860 at NASA Ames Research 
Center. The iPSC/860 contains 128 processing nodes in a 
hypercube configuration. Each processing node consists 
of a 40MHz Intel i860 processor and 8MB of memory. 
Input/output is handled by 10 additional nodes (386- 
based processors), each of which has 8MB of memory. 
Ten 760MB disks are available for file storage. The front- 
end machine is an Intel computer with an 80386 processor 
and 8MB of memory. 

On the iPSC/860, the memory for each processing node 
is dedicated to that node, and messages must be passed 
between nodes when data are to be shared. Message pass- 
ing is costly in CPU time, and one of the primary pro- 
gramming tasks is to minimize this cost through careful 
choice of domain decomposition and the use of appropri- 
ate solvers. 
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The nondimensional k-t transport equations in trans- 
formed coordinates are given by 
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Table 1: Constants for k - c modplc 



c, 

C 2 

C 3 

G, 

c u 

<7k 

<7 f 

Chien tow-Re 3 

1.35 

1.8 

0.0115 

0.5 

0.09 

1.0 

1.3 

high-Re 4 

1.44 

1.92 

- 

- 

0.09 

1.0 

1.3 


n + = Re 


u.n 


V 


(19) 


/= 1 



and 


k 2 

= Re C^p— 



( 20 ) 


( 21 ) 


Here, n is the normal distance from the wall. 

The transport equations for the Chien low-Reynolds- 
number model are the same as those for the high- 
Reynolds-number model with the addition of terms to 
account for the effects of solid walls. The low-Reynolds- 
number terms are shown in boxes in the above equations. 
The standard set of constants, shown in Table 1, is used 
for all of the present computations. 


Figure 1: Typical domain decomposition 
Domain Decomposition 



Although the present application involves simple ge- 
ometries, the parallel code was made as general as possi- 
ble. A wall function formulation 5 which is applicable to 
general geometries with wall heat transfer was therefore 
employed. 

Timing Considerations 

In a previous study, 5 2D cases were computed on a 
Cray Y-MP using the present turbulence models. In those 
computations, a block Beam- Warming algorithm was uti- 
lized, which differs from the present diagonalized algo- 
rithm, but the relative timings for the turbulence models 
are still valid. The resulting Y-MP timings were as fol- 
lows: laminar, 35.7 /rs/pt./step; k-e with wail functions, 
49.5 n s/pt./step; and Chien model, 50.9 fts/ pt./step. The 
wall function and Chien cases required 38.7% and 42.6% 
more time respectively compared to the laminar compu- 
tations. The Chien model requires slightly more time 
than wall functions (2.8%) due to additional terms in the 
transport equations and the source term Jacobians. 

Some turbulence models require computation of nor- 
mal distances to walls. This is true for the low-Revnolds- 
number terms in equations ( 17) and (21). This is a* disad- 
vantage for MIMD implementations, since the locations 
of the walls must be passed to other nodes. Distances to 
walls are not required for the k-e model if wall functions 
are employed. One of the purposes of the present study 
is to quantify the effects of these differences on computa- 
tional speed. 


The computational grid is divided into subdomains, 
and each subdomain is assigned to one node. At sub- 
domain surfaces which are internal to the overall domain, 
one plane of data adjacent to each internal surface is re- 
quired for the computation of the right hand side of each 
set of equations. These data may be obtained by passing 
messages between subdomains, or alternatively, an addi- 
tional, overlapping plane may be included at each internal 
subdomain surface. The latter approach requires addi- 
tional memory to store data for the overlapping planes, 
but runs faster, since some message passing is avoided. 
This is the approach taken in the present code. 

A typical domain decomposition, for a flat plate compu- 
tation utilizing 16 nodes, is shown in Fig. 1. In this figure, 
the £ direction is parallel to the plate and the rj direction 
is normal to the plate. Subdomains are represented bv 
boxes, and the corresponding node number is shown in 
each subdomain. The ranges of grid lines contained in 
the nodes are shown on the left side and top of the fig- 
ure. As an example of the grid overlap, data for £ = 23 
through £ = 47 are stored on node 5, even though the 
solution domain for that node is £ = 24 through f = 46. 
Data for ( = 46 through ( = 70 are stored on node 6, etc. 
Additional details of the domain decomposition are given 
by Ryan and Weeratunga. 2 

Solvers 

The Navier-Stokes and k-e equations are solved looselv 
coupled. A second-order, diagonalized, approximate- 
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0 — 1 — 2 — 3 — 2 — I — 0). The load imbalance (idle 
time) which is inherent in PGE is shown by the extent of 
the blank areas. 


Wall Functions 


For the present study, the wall function formulation of 
Sondak and Pletcher 5 has been extended to include flows 
with wall heat transfer. For each grid point adjacent to a 
wall, it is determined whether the point lies in the viscous 
sublayer or in the log region (neglecting the buffer region) 
by computing the value of y + . If the point is in the viscous 
sublayer, the heat transfer rate is computed from 


7 \x de 

^ Pr dxi T 


( 22 ) 


Figure 2: AIMS diagram for scalar tridiagonal solver 

factorization scheme (ARC3D) 6 is employed for the 
Navier-Stokes equations, and a first-order upwind scheme 
with block tridiagonal solvers is used for the k-t equa- 
tions. For the k-€ equations, the source terms are lin- 
earized, and the source term Jacobian is arbitrarily in- 
cluded in a single factor. These equations are coupled 
to one another solely through the source terms, so the 
equations in the two factors which do not include the 
source term Jacobians are uncoupled from one another. 
For these factors, a scalar tridiagonal solver is employed. 
For the factor containing the source term Jacobian, a 2x2 
block tridiagonal solver is employed. 

Both equation sets are solved using Pipelined Gaussian 
Elimination (PGE), since this technique was found to be 
a resonable compromise between CPU time and memory 
requirements by Ryan and Weeratunga' for the Navier- 
Stokes solver. In PGE, a short message is passed across 
the subdomain boundary for each grid point as soon as 
the data are available. 

A tool has been developed at NASA Ames, the Auto- 
mated Instrumentation and Monitoring System (AIMS), 
to help visualize the operation of parallel programs. 
AIMS was utilized to verify the coding of the k-c solvers. 
An example AIMS diagram for the scalar tridiagonal 
solver for the domain decomposition of Fig. 1 is shown 
in Fig. 2. Each horizontal bar represents a processor, 
numbered on the left side of the figure, and the abcissa 
represents time. Processors are in operation in times dur- 
ing which the bars are solid, and are idle at other times, 
typically waiting for messages to arrive. Lines connecting 
processor bars represent messages being passed. The gray 
bars represent computation time spent in the solver, and 
the black bars represent computation time in other sub- 
routines. For this configuration of processors, the forward 
and backward sweeps are clearly shown (e.g., processors 


assuming constant specific heats. 

For points in the log region, the temperature log equa- 
tion 8 is employed: 

T + = -!ny + +.-l (23) 

K 

where T = 4.6 for the standard value Pr, = 0.9. The 
wall heat transfer rale is then given by 

(24) 

The direction of heat transfer is normal to the wall, and 
the magnitude is equal to q Wt so the heat transfer vec- 
tor is completely specified. The heat transfer vector is 
then transformed to the Cartesian coordinate system. 
The Cartesian components are then substituted into the 
right hand side of the energy equation. This method is 
analogous to that used for shear stress by Sondak and 
Pletcher. 5 

Results 

A simple test case was desired which could be used to 
verify that the parallel implementation of the turbulence 
model was working correctly, and to obtain some prelim- 
inary timings. Turbulent, subsonic flow over a flat plate 
was chosen for this purpose. 

One of the eventual goals of the present work is to 
compute the heat transfer and flowfield for the impinging 
jets of a powered lift aircraft in hover for the HPA grand 
challenge. Toward this end, the second test case is the 
flow and heat transfer for a jet impinging normally on a 
flat plate. 

Flat Plate 

Flow over a semi-infinite flat plate was solved using a 
91 x 91 grid. The grid is clustered at the plate surface and 
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(b) Closeup of leading edge region 


Figure 3: Flat plate grid 


at the leading edge, as shown in Fig. 3. A grid spacing of 
1 x 10 5 (normalized by the plate length) at the wall was 
chosen to yield r: 3, keeping the grid point adjacent to 
the wall in the viscous sublayer, as required by the Chien 
model. There are 30 grid lines upstream of the leading 
edge, and 61 on the plate. 

A value of = 0.2 was chosen to maintain a rea- 
sonable convergence rate with essentially incompressible 
flow. The Reynolds number based on freestream velocity 
was approximately equal to 9 x 10 s at the outflow bound- 
ary. A small value of freestream turbulent kinetic en- 
ergy was chosen (*/«*, = 0.00025) based on past experi- 
ence, 5 and the freestream dissipation rate was chosen such 
that Utoo — 1- At the inflow boundary, density, momen- 
tum components, k, and c were set to freestream values, 
and pressure was extrapolated. At the outflow boundary,’ 
pressure was fixed at the freestream value, and density, 
momentum components, and k and i were extrapolated.’ 
On the stagnation streamline, density, streamwise mo- 





Tab 

e 2: Flat plate timings 

no. nodes 

cpu time (ps/pt./step) | 

laminar 

wall fen. 

Chien 

4 

98.9 

161.4 

178.7 

8 

56.5 

89.7 

100.2 

16 

33.5 

52.4 

59.0 

32 

22.3 

33.9 

37.1 

64 

15.1 

21.9 

26". 2 


A semiempirical equation for friction coefficient as 
a function of momentum-thickness Reynolds number is 
given by 8 

Reg = ^3.75— e 0 4 O-8) (25) 

where A = y/2/Cj. The computed friction coefficient 
distributions are compared with this equation in Fig. 4. 

Table 2 shows CPU times for both turbulence models as 
well as laminar flow. The Chien model requires 10-20% 
more CPU time than the high-Reynolds-number model 
with wall functions. In the serial code discussed above, 
the the Chien model only required 2.8% more time. This 
difference is attributable to the additional message pass- 
ing required by the Chien model for computation of the 
damping functions. 


mentum, pressure, k , and e were extrapolated and normal 
momentum was set to zero (inviscid wall). On the plate, 
density and pressure were extrapolated, and momentum 
components were set to zero (no-slip). For the Chien 
model, k and c were set to zero at the plate. For wall 
functions, k and e at the plate are part of the formula- 
tion. At the outer edge, ail quantities were extrapolated. 


Impinging Jet 

The second test case was a slot jet impinging on a cool 
constant-temperature surface, as studied experimentally 
by Schauer and Eustis. 9 The configuration chosen for the 
present computation had a height-to-jet-width ratio of 40. 
a jet Reynolds number of 4.08 x 10 4 , and a jet-to-wail- 
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0.004 



(a) Nozzle region 



(b) Impingment region 


Figure 5: Impinging jet grid 


temperature ratio of 1.105. The ambient temperature was 
equal to the jet temperature. 

Parts of the 121 x 121 grid is shown in Fig. 5. The 
grid spacing at the wall was set to 2 x 10“ 4 for a value of 
m 1. A grid with a wall spacing of 1 x 10~ 3 (yi & 5) 
was tried initially, but it did not have sufficient resolu- 
tion for the thermal boundary layer. The domain is 60 
slot widths wide and 60 slot widths high. The bound- 
ary conditions are as follows: Symmetry is imposed on 
all quantities at the jet centerline. On the surface of the 
channel from which the jet discharges, density and pres- 
sure are extrapolated, and momentum components are set 
to zero (no-slip). For the Chien model, the damping func- 
tions are not applied at this surface in order to minimize 
message- passing. On the plate, momentum components 
are set to zero (no-slip), the temperature is fixed, and 
pressure is extrapolated. For the Chien model, k and c 
are set to zero at the plate. For wall functions, k and c 
at the plate are part of the wall function formulation. At 


0.000 



20 40 

x/d 


60 


Figure 6: Impinging jet heat transfer 


Table 3: Impinging jet timings 


no. nodes 

cpu time (/js/pt. /step) 

laminar 

wall fen. 

Chien 

8 

52.7 

86.7 

95.2 

16 

33.0 

52.3 

58.6 

32 

18.8 

29.8 

32.9 

64 

11.8 

18.6 



the outflow boundary, pressure is fixed to the freestream 
value, and density, momentum components, k , and t are 
extrapolated. At the outer edge, all quantities are ex- 
trapolated. 

The computed flowfieid was unsteady, with vortices 
rolling up at the edge of the jet and traversing the do- 
main. The period of the unsteadiness was determined by 
monitoring the temperature of the flow at a grid point 
adjacent to the plate in the wall jet, and the resulting 
Strouhal number was equal to 0.29. The heat transfer 
rates were determined by averaging the values over every 
time step in one period (about 700 steps). The resulting 
Stanton number distributions are compared with the ex- 
perimental data in Fig. 6. It should be noted that the grid 
was fine enough that the inner layer of the two-layer wall 
function formulation was called into play at all points. 
The temperature log equation could not be tested for the 
present configuration due to the thermal boundary layer 
resolution issue mentioned above. 

A table of timings for the impinging jet case is shown 
in Table 3. The trends are the same as those for the flat 
plate case, as is to be expected. No timings are shown for 
4 nodes because there was not sufficient memory available 
for the 121 x 121 grid. 

Summary and Conclusions 

The k - i turbulence model has been added to a paral- 
lel Navier-Stokes solver on an Intel iPSC/860. Two test 
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cases have been computed: flow over an adiabatic flat 
plate, and an impinging jet flow with wall heat transfer. 
Two wall treatments were used for the k — c model, wall 
functions, and the Chien low-Reynolds-number model. 

The Chien model incurred an average CPU penalty of 
13% compared to the wall function computations (on the 
same grids). Approximately 10% is attributable to mes- 
sage passing required in the calculation of damping func- 
tions. The remaining 3% was also seen in a serial im- 
plementation, and is therefore not a function of message 
passing. 

Some low-Reynolds-number models, such as that of 
Jones and Launder, 10 do not utilize damping functions. 
These models therefore will have an advantage in terms 
of CPU time for parallel applications. 

The impinging jet Stanton number distribution looks 
somewhat better for wall functions than for the Chien 
model, but it should not be concluded that wall functions 
will generally yield better results for this type of flowfield. 
Applications which exercise the temperature log equation 
part of the wall function formulation must be computed 
before any conclusions can be reached. 

After additional validation, the next step will be to cou- 
ple the present code with a thermal conduction solver 1 1 to 
study coupled thermal conduction/convection problems. 
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